
Conversation

@superdosh (Contributor):

Initial implementation of a workflow that someone developing a new evaluator could use, with experiment tracking managed in MLflow.

I tried to explain what's happening in the README.md, and the best way to see it is via the template Jupyter notebook.

Opening as draft for feedback!

superdosh self-assigned this May 22, 2025
github-actions bot commented May 22, 2025:

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

superdosh force-pushed the initial-implementation branch from 2700fe9 to d40f200 on May 22, 2025 14:42
superdosh force-pushed the initial-implementation branch from d40f200 to 80a1cb0 on May 22, 2025 14:44
@superdosh (Contributor, Author):

Apologies for the noise; I had the wrong email in my git config, so the CLA check failed. Fixed now.

superdosh marked this pull request as ready for review May 27, 2025 19:04
superdosh requested a review from a team as a code owner May 27, 2025 19:04
@bkorycki (Contributor) left a comment:

This looks great! I haven't tried playing with it yet, but the README makes it seem simple to use, which is awesome. I also think that in the future it would be very valuable to be able to run non-local DVC datasets.
One thing I think we should maybe think about now is standardizing the experiment IDs. This would enable users to more easily dig through a large list of past runs. For example, the ID could be automatically constructed from the SUT IDs/annotator IDs/dataset name, and an optional tag. @bollacker may have more thoughts here.
Other than that I don't have any major notes! Looks like a solid first implementation to me. :)
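
To make the suggested scheme concrete, here is a minimal sketch of how such an ID could be assembled. The helper name and the example values are hypothetical, not part of this PR:

# Hypothetical helper illustrating the ID scheme suggested above.
def build_experiment_id(sut_ids, annotator_ids, dataset_name, tag=None):
    # Sort for stable IDs regardless of the order components are passed in.
    parts = ["-".join(sorted(sut_ids)), "-".join(sorted(annotator_ids)), dataset_name]
    if tag is not None:
        parts.append(tag)
    return "_".join(parts)

# build_experiment_id(["demo_sut"], ["demo_annotator"], "demo_dataset", tag="v1")
# -> "demo_sut_demo_annotator_demo_dataset_v1"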

help="The number of jobs to run in parallel. Defaults to 1.",
)
@load_from_dotenv
def get_responses(
Contributor:

This is a nit-pick, but can you rename the response/responder stuff to be something more specific to SUTs? "Response" isn't a term unique to SUTs, imho.

@superdosh (Contributor, Author):

Definitely! Is there a term we use elsewhere that would make sense to re-use here? Or it could just be get_sut_responses?

Contributor:

I think get_sut_responses is good :)
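
For reference, a minimal sketch of what the rename could look like, assuming the click-style setup visible in the snippet above; the option name and body are illustrative, and only load_from_dotenv and the help text come from this PR:

import click

@click.command()
@click.option(
    "--jobs",  # option name assumed for illustration
    default=1,
    help="The number of jobs to run in parallel. Defaults to 1.",
)
@load_from_dotenv  # decorator from this PR
def get_sut_responses(jobs):  # renamed from get_responses per this thread
    ...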

@superdosh (Contributor, Author) commented May 27, 2025:

> One thing I think we should maybe think about now is standardizing the experiment IDs. This would enable users to more easily dig through a large list of past runs. For example, the ID could be automatically constructed from the SUT IDs/annotator IDs/dataset name, and an optional tag.

@bkorycki, I like this idea, though I worry that if we use all of those things to construct the experiment name, it'll be too long. If we're good about the tagging, that should enable the nice searching on its own. However, we're currently missing the sut_id tag on the annotator run; I'll add that.

I'll add a note on this to the README under the TODOs!
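
As a sketch of the tag-based searching described above (standard MLflow fluent APIs; the tag key and value here are assumptions from this thread, not code from the PR):

import mlflow

# During the annotator run, tag it with the SUT whose responses are being
# annotated (the missing sut_id tag mentioned above; the value is illustrative).
with mlflow.start_run():
    mlflow.set_tag("sut_id", "demo_sut")

# Later, past runs can be dug up by tag instead of by experiment name.
runs = mlflow.search_runs(
    search_all_experiments=True,
    filter_string="tags.sut_id = 'demo_sut'",
)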

superdosh merged commit 576e5cc into main May 28, 2025 (1 check passed)
github-actions bot locked and limited conversation to collaborators May 28, 2025
@superdosh (Contributor, Author):

@bollacker, merging so we can branch off of main for the next steps, but please still comment if you like!

superdosh deleted the initial-implementation branch May 30, 2025 19:21